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Abstract 

Background: Recently, one of the greatest challenges in genome-wide association studies is to detect gene-gene 
and/or gene-environment interactions for common complex human diseases. Ritchie et al. (2001) proposed 
multifactor dimensionality reduction (MDR) method for interaction analysis. MDR is a combinatorial approach to 
reduce multi-locus genotypes into high-risk and low-risk groups. Although MDR has been widely used for case- 
control studies with binary phenotypes, several extensions have been proposed. One of these methods, a 
generalized MDR (GMDR) proposed by Lou et al. (2007), allows adjusting for covariates and applying to both 
dichotomous and continuous phenotypes. GMDR uses the residual score of a generalized linear model of 
phenotypes to assign either high-risk or low-risk group, while MDR uses the ratio of cases to controls. 

Methods: In this study, we propose multivariate GMDR, an extension of GMDR for multivariate phenotypes. Jointly 
analysing correlated multivariate phenotypes may have more power to detect susceptible genes and gene-gene 
interactions. We construct generalized estimating equations (GEE) with multivariate phenotypes to extend 
generalized linear models. Using the score vectors from GEE we discriminate high-risk from low-risk groups. We 
applied the multivariate GMDR method to the blood pressure data of the 7,546 subjects from the Korean 
Association Resource study: systolic blood pressure (SBP) and diastolic blood pressure (DBF). We compare the 
results of multivariate GMDR for SBP and DBP to the results from separate univariate GMDR for SBP and DBP, 
respectively. We also applied the multivariate GMDR method to the repeatedly measured hypertension status from 
5,466 subjects and compared its result with those of univariate GMDR at each time point. 

Results: Results from the univariate GMDR and multivariate GMDR in two-locus model with both blood pressures 
and hypertension phenotypes indicate best combinations of SNPs whose interaction has significant association 
with risk for high blood pressures or hypertension. Although the test balanced accuracy (BA) of multivariate 
analysis was not always greater than that of univariate analysis, the multivariate BAs were more stable with smaller 
standard deviations. 

Conclusions: In this study, we have developed multivariate GMDR method using GEE approach. It is useful to use 
multivariate GMDR with correlated multiple phenotypes of interests. 
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Background 

Genome-wide association studies (GWAS) have been 
successfully conducted to detect disease susceptibility 
genes for common complex human diseases by focusing 
on associations between single-nucleotide polymorph- 
isms (SNPs) and phenotypes [1]. While traditional meth- 
ods for GWAS consider only one SNP at a time, some 
common complex human diseases such as diabetes, 
hypertension, and various types of cancers are knovm to 
be influenced by multiple genetic variants [2]. In addi- 
tion, one of the greatest challenges in GWAS is to dis- 
cover gene-gene and/or gene-environment interactions. 

Classic logistic regression can be used to analyze the 
gene-gene interaction [3]. However, logistic regression 
suffers from an overfitting problem in high-order inter- 
actions [4]. Multifactor dimensionality reduction (MDR) 
method is a nonparametric, model-free, and combina- 
torial approach for interaction analysis by identification 
of a multi-locus model for association in case-control 
studies [5-9]. MDR method reduces multi-locus geno- 
types into two disease risk groups: high-risk and low- 
risk groups. If the ratio of cases and controls in a com- 
bination of genotypes is larger than a pre-assigned 
threshold T (e.g., T = 1), the cell of combination is 
labelled as "high risk", otherwise, "low risk". MDR 
method shows greater power for testing high-order 
interactions compared with logistic regression analysis 
[10]. Several statistical methods have been extended 
from MDR approach [11-16]. One of the extended 
methods of MDR is a generalized MDR (GMDR) pro- 
posed by Lou et al. [16]. GMDR method allows adjust- 
ing for covariates and applying to both dichotomous 
and continuous phenotypes; it uses the score-based sta- 
tistic obtained from generalized linear model of pheno- 
types on the predictor-variable and covariates instead of 
the ratio of cases and controls in original MDR method. 

These GWAS methods are generally implemented in a 
univariate framework analysing one phenotype at a time 
even though multiple phenotypes of interest are col- 
lected from a study population. In particular, pleiotropy 
that occurs due to potential genetic correlation between 
multiple phenotypic traits plays a role in pathogenesis of 
correlated human diseases [17]. Jointly analysing corre- 
lated multivariate phenotypes may have more power to 
detect susceptible genes and gene-gene interactions by 
using more information from data. Classic multivariate 
methods such as likelihood based mixed effects model 
[18,19] and generalize estimating equations (GEE) [20], 
and extended versions of these methods [21,22] can be 
applied to multivariate phenotypes of GWAS. 

In this study, we have proposed multivariate GMDR 
method by extending GMDR method for the multivariate 
phenotypes. We construct GEE model with multivariate 



phenotypes to extend generalized linear models. The 
GEE approach is exceptionally useful method for the ana- 
lysis of longitudinal data, especially when the response 
variable is discrete [23]. Using the score vectors from 
GEE, we discriminate high-risk from low-risk groups. 
The proposed multivariate GMDR method can also han- 
dle the repeatedly measured phenotypes. 

We apply the proposed multivariate GMDR method 
to the Korean Association Resource study on blood 
pressure: systolic blood pressure (SBP) and diastolic 
blood pressure (DBP). A number of authors have 
investigated the genome-wide association studies on 
blood pressure and hypertension for Korean popula- 
tion [24-26] and for others [27-30]. However, not 
much work has been done for gene-gene interaction 
analyses. We compare the results of multivariate 
GMDR for SBP and DBP to the results from original 
univariate GMDR for SBP and DBP, respectively. We 
also apply the multivariate GMDR method to the 
repeated measured hypertension phenotypes and com- 
pare its result with those from univariate GMDR at 
each time point. 

Methods 
Multivariate GMDR 

We introduce the generalized estimating equation (GEE) 
regarding a multivariate version of generalized linear 
model (GLM) which is implemented in GMDR. Let 
Yi = (yii/ • • ■ / Yit)^ be the t x 1 vector of the phenotypes 
for subject / (i = 1, • • • ,n), with expectation E(Yit) = ixu. 
For the multivariate phenotype vector, Yi, we assume an 
underlying generalized linear model which can be writ- 
ten as 

1i=g{\i'i) =XiP +ZiY, 

where g( ) denotes a known one-to-one link function 
that is allowed to change with the characteristics of the 
different types of phenotype Yi- Xi and Z,- represent 
design matrices of genotype values and known covariate 
values including the unit component, respectively, and j8 
and y are vectors of their corresponding parameters, 
respectively. We assume that Yit has a probability distri- 
bution belonging to the exponential family of distribu- 
tions formed as 

/ {Yiu Oil, </>) = exp { [Yit9it - b iOit)] /<j) + c(yiu <t>)} ■ 

The GEE estimators of 8 = [P^, y^) for marginal mod- 
els can be obtained from the solution of a set of follow- 
ing generalized estimating equations: 

^w=i:(^)V{n-^.w}=0' 
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where d/ij/dS is a matrix of derivatives whose hth col- 
umn is dfij/dSh. Vi is constructed as V, = (pB^^^R (a) b\^^ , 
where Bj = diag (b" {On)) is a diagonal matrix with main 
diagonal elements of variance function, fj " {Oit)> and R is 
a correlation matrix. V, and R are "working" covariance 
and correlation to distinguish them from the true covar- 
iance and correlation among Y,, respectively. When we 
use canonical link function, dOi/diJi is the identity 
matrix. Let Q be the matrix of predictor values with X,- 
and Z, for subject i. By the chain rule, 



dS 



dfii dOi drii 
9^ 9^ "93" 



BiliQ. 



Then the score equations for 8 are 

n 



i=l 



The expression, B/VJ"^ {y, — can be written as a 

vector of the residual of each phenotype, Yit. Thus, the 
residual score vector for individual i is defined as: 



where /t, = (■^iy) and y is estimator obtained from 
estimating equations under the null hypothesis 
B,. B, and V; are calculated using //,;. Based on this resi- 
dual score vector, each individual with phenotypes is 

discriminated between case and control status. From the 

residual score vector for individual, we propose the 

t 

aggregation for elements of the score vector, S; = ^ Sy, 

i=i 

and use that as a prediction score for each individual. If 
the sum of prediction scores over those individuals who 
have the corresponding genotype combination is greater 
than a threshold value, assign 'high-risk' to the cell cor- 
responding to the genotype combination. Otherwise, 
assign 'low-risk' to the cell. 

Data 

Our primary outcomes are two types of blood pressure, 
systolic blood pressure (SBP) and diastolic blood pressure 
(DBP), and hypertension diagnosis of the Korean Associa- 
tion REsource (KARE) Consortium. The measurements of 
blood pressure were dichotomized at 140 mmHg for SBP 
and 90 mmHg for DBP, and denoted by SBPg and DBPb, 
respectively. We defined the hypertensive case as HP = 1 
if SBP > 140 mmHg or DBP > 90 mmHg, and HP = 0, 
otherwise. Several genome-wide association studies 
(GWAS) have been performed on blood pressure by 



treating blood pressure as a quantitative trait [24-29] . In 
this study, we treated blood pressure as a binary trait HP 
representing whether the hypertension status is yes or no. 
Among 8,842 KARE subjects, 1,291 subjects were removed 
in the analysis due to anti-hypertensive therapy and drug 
treatments that could influence blood pressure. Addition- 
ally, 5 subjects were excluded because of missingness in 
SBP and body mass index (BMI). Of the 7,546 subjects 
considered in the study, 4,080 (54%) subjects were from 
urban community Ansan and the others were from rural 
community Ansung. For the study, the average age is 48.4 
years for Ansan and 55.0 years for Ansung. There are 
three times of bi-yearly measured hypertensive status fi-om 
2001 to 2006, denoted by HPi, HP2, and HP3. Among 
7,546 subjects, 2,080 subjects did not follow up at time 2 
or 3. Subject characteristics are summarized in Table 1. 
The genomic DNAs were genotyped using Affymetrix 
Genome- Wide Human SNP Array 5.0. The quality control 
procedures were adopted such as missing genotype fre- 
quency > 0.5% and minor allele frequency (MAF) < 0.01 at 
least on area. Finally a total of 7,546 individuals and 
344,596 SNPs were included in the analysis of dichoto- 
mized SBPb and DBPb, whUe a total of 5,466 individuals 
and 344,309 SNPs were included in the analysis of repeat- 
edly measured hypertension status. 



Table 1 Subject characteristics of the KARE. 



Phenotype 




N(=7,546) 


% 


Recruit area 










Ansung 


3,466 


45.9 




Ansan 


4,080 


54.1 


Gender 










Male 


3,743 


49.6 




Female 


3,803 


504 


Systolic blood pressure 










> 140 


701 


9.3 




< 140 


6,845 


90.7 


Diastolic blood pressure 










> 90 


693 


9.2 




< 90 


6,853 


90.8 


Age (years) 




Mean 


SD 




Overall 


514 


8.79 




Ansung 


55.0 


8.82 




Ansan 


48.4 


7.51 


Body mass index (kg/m^) 










Overall 


24.4 


3.08 


Hypertensive cases 




N*(=5,466) 


% 


(SBP > 140 or 


HPi (Time 1) 


716 


13.1 


DBP > 90) 


HP2 (Time 2) 


706 


12.9 




HP3 (Time 3) 


698 


12.8 



Abbreviations: DBP, diastolic 
Resource; SBP, systolic blood 



blood pressure; KARE, 
pressure. 



Korean Association 
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Results 

Preliminary analyses 

To compare multivariate analysis with univariate analy- 
sis, we first separately fit a logistic regression model for 
each dichotomized blood pressure measurement SBPg 
and DBPb with covariate adjustment for recruitment 
area, age, sex, and BMI. The correlation between SBPg 
and DBPb is 0.48. The multivariate analysis with two 
binary phenotypes (SBPb, DBPb) was conducted using 
the GEE approach. For the repeatedly measured hyper- 
tension status HPi, HP2, and HP3, we fit logistic models 
for each HPj and fit the GEE model for three HPs 
simultaneously. The pairwise correlations range from 
0.32 to 0.36. In the GEE model, we assumed two types 
of genetic effect: homogeneous genetic effect and het- 
erogeneous genetic effect for multivariate phenotypes. 
However, when we compared the effect sizes and p- 
values of homogeneous model with those of heteroge- 
neous model, there was no strong evidence for support- 
ing the homogeneous genetic effect. So, we present the 
results of the GEE model with heterogeneous genetic 
effects for multivariate phenotypes in both of blood 
pressures and repeatedly measured hypertension status. 

To perform gene-gene interaction analysis using 
GMDR analyses, we first selected SNPs with strong 
marginal effects in univariate models and among those, 
we select the ones with strong effects in multivariate 
models. For SBPb and DBPb analysis, we selected the 
top 50 SNPs for each SBPb and DBPb. From these 100 
SNPs, we chose 35 SNPs using a p-value criterion (< 1 
X 10'*) from the GEE model. In a similar manner, we 
chose 34 SNPs for HPi, HP2, and HP3 by selecting the 
top 50 SNPs for each HPj using the same p-value criter- 
ion from their GEE model. 

Univariate logistic and multivariate GEE analyses of SBPb 
and DBPb 

We report results of GWA studies of dichotomized SBPb 
and DBPb, and their multivariate analyses. For SBPb and 
DBPb, the Manhattan plots are given in Figure 1. As 
summarized in Table 2, five SNPs for SBPb (rsl549022, 
rs2111464, rsl2942470, rs2088983, and rsl768145) and 
three SNPs for DBPb (rsl7045441, rsl 1866964, and 
rs7555790) were selected at the 10'^ significance level. 
For multivariate GEE analysis for (SBPb, DBPb), six SNPs 
were selected: rsl7045441, rsl378942, rsl2942470, 
rsl549022, rs927833, and rs2111464. Among these six 
SNPs selected from multivariate GEE analysis, four SNPs 
were also found by univariate analysis but two SNPs 
(rsl378942 and rs2111464) were not. A gene CSK in 
which SNP rsl378942 is located has been reported as a 
hypertension susceptibility gene in the Korean population 
[25,26] and also in East Asians [30]. 



Univariate logistic and multivariate GEE analyses of HP^, 
HP2, and HP3 

We performed association analysis for the repeatedly 
measured binary hypertension phenotypes HPi, HP2, 
and HP3. First, the logistic regression model was fit for 
each HPj and multivariate analysis for (HPi, HP2, and 
HP3) was performed by GEE model. Manhattan plots 
are given in Figure 2. Nineteen SNPs were selected at 
10'^ significance level (Table 3): four for HPi 
(rsl7675997, rs2411259, rs4084097, and rs7751214), five 
for HP2 (rs4908736, rsl7677051, rs4867707, rs550214, 
and rsll636344), and seven for HP3 (rs294082, 
rs4495407, rsl0956596, rs6470947, rs4615555, 
rs4279577, and rs7465333), and three for multivariate 
HPs (rsl2054837, rs4084097, and rsl7722281). However, 
none of the identified SNPs were commonly observed 
by all three univariate analyses (Table 3). It might be 
due to the fact that the status of subject with hyperten- 
sion is very volatile over time (Table 4) even though the 
proportion of hypertension risk was stable over time 
(Table 1). Thus the signals of association with hyperten- 
sion were differently expressed over time. Among three 
SNPs from multivariate analysis, SNP rs4084097 was 
also associated with hypertension by univariate analysis 
at time 1. However, there were no common SNPs 
between multivariate GEE analysis and univariate ana- 
lyses at times 2 and 3. One hypertension SNP at time 2, 
rsll636344, in FBNl gene and another SNP rsl7722281 
of WWOX gene from multivariate have been previously 
found to be associated with hypertension in China 
population [31,32]. 

Univariate GMDR and multivariate GMDR analyses of 
SBPb and DBPb 

We present GMDR results to discover gene-gene and/or 
gene-environment interactions. For univariate GMDR 
analysis, logistic regression models for dichotomized 
SBPb and DBPb were constructed with area, age, sex, 
and BMI as covariates under the null hypothesis of no 
genetic effect. For multivariate GMDR analysis, the GEE 
model with same covariates was constructed. To reduce 
the computational burden, we focused on 35 SNPs 
selected from the preliminary analysis. All possible one 
and two locus models were fit for 35 SNPs. Through 
10-fold-cross validation the best combination of loci 
with maximum train balanced accuracy (BA) which is 
average of sensitivity and specificity was chosen at each 
fold. To choose the final model, we considered cross- 
validation consistency (CVC) among a set of best 
combinations. 

Table 5 summarizes the best model. Train BA, Test 
BA, and CVC from univariate GMDR and multivariate 
GMDR. For the purpose of comparison, we computed 
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Figure 1 Manliattan plots of SBPb and DBPb in univariate and multivariate analyses. (A) SBPb in logistic regression (B) DBPb in logistic 
regression (C) Multivariate model with SBPb and DBPb (D) Overlay plot of (A)-(C). 



Table 2 Selected SNPs of SBP and DBP from univariate and multivariate analyses. 



CHR 


SNP 


Gene symbol 




SBP 




DBP 




Multivariate 




Beta 


P-value 


Beta 


P-value 


Betal 


Beta2 


P-value 


1 


r57555790 


PEX14 


0.117 


416E-03 


0.184 


4.46E-06 


0.046 


0.116 


2.35E-05 


2 


rs21 11464 




0.200 


1.11E-06 


0.100 


1 .28E-02 


0.293 


0.195 


8.77E-06 


2 


rsl 549022 




0.207 


6.52E-07 


0.111 


5.89E-03 


0.295 


0.202 


5.23E-06 


3 


rs 1768145 




0.169 


8.24E-06 


0.090 


2.01 E-02 


0.233 


0.161 


7.95 E-05 


4 


rsl 7045441 


ANK2 


0.065 


1.06E-01 


0.199 


7.69E-08 


-0.090 


0.058 


3.91 E-08 


4 


rs2088983 




0.168 


6.96E-06 


0.090 


1 .82E-02 


0.234 


0.162 


4.54E-05 


15 


rsl 378942 


CSK 


-0.189 


2.50E-05 


-0.192 


1.85E-05 


-0.167 


-0.182 


3.49E-06 


16 


rsl 1866964 


ZNF423 


-0.089 


3.66E-02 


-0.206 


2.78E-06 


0.036 


-0.087 


3.26E-05 


17 


rsl 2942470 




0.186 


4.36E-06 


0.041 


3.12E-01 


0.326 


0.180 


4.25E-06 


20 


rs927833 


10000270679 


-0.127 


2.31 E-02 


0.074 


4.53E-02 


-0.343 


-0.130 


7.43E-06 
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Figure 2 Manhattan plots of longitudinal hypertension data in univariate and multivariate analyses (A) HP, in logistic regression (B) HP2 
in logistic regression (C) HP3 in logistic regression (D) Multivariate model with longitudinal hypertension (E) Overlay plot of (A)-(D). 



the p-values from the logistic models and GEE model 
for the SNPs in one-locus model of GMDR methods. 
The identified SNPs by GMDR methods also have sig- 
nificant p-values from these analyses: 6.51E-07 for SBPb, 
4.21E-05 for DBPb, and 3.26E-05 for multivariate pheno- 
types. The best two-locus model of DBPb included one 
SNP, rs 1378942, in CSK and another SNP, rs 11 866964, 
in ZNF423 implying that the interaction between CSK 
and ZNF423 genes was identified as a significant contri- 
butor to dichotomized DBPb. The test BAs of the one- 
locus models (two-locus model) for these SNPs were 
0.545 and 0.549 (0.566) for rsl378942 and rsll866964. 



respectively. The best two-locus model from the multi- 
variate GMDR included rs7555790 in PEX14 gene and 
rsll077135 in A2BP1 gene. The test BA of the one- 
locus models (two-locus model) for these SNPs were 
0.526 and 0.532 (0.546), respectively. It seems that the 
contribution was from the joint effects of two genes 
rather than their main effects. The graphical descrip- 
tions for test BA are given in Figure 3. The median of 
test BA for multivariate GMDR is between median of 
SBPb and DBPb in both one and two-locus models. The 
distribution of test BA for multivariate GMDR is more 
concentrated than those of SBPb and DBPb. 
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Table 3 Selected SNPs of longitudinal hypertension from univariate and multivariate analyses. 



CHR 


SNP 


Gene symbol 




HPi 




HP2 




HP3 




Multivariate 




Beta 


P'Value 


Beta 


P-value 


Beta 


P-value 


Betal 


Beta2 


Beta 3 


P-value 


1 
1 


rS4yUo / DO 




Dill 

U. 1 1 1 


O.UzL-U3 


n 1 7Q 
U. 1 /o 


0.0.3 t-Uo 


n n70 

u.u/y 


Q 1 7P m 


n 1 1 n 
U. 1 1 U 


n 1 7Q 
u. 1 /o 


n n7Q 
u.u/y 


\ .z 1 t-U4 


4 


rs 1 /D/byy/ 




0.1 76 


o.lob-Oo 


0.051 


2.1 1 E-01 


0.049 


z.z/L-U 1 


0.1 75 


U.UD 1 


U.U40 


1 .27E-04 


4 


1-1-1/11 1 Tcn 


LULidzd/o 


0.176 


5.33 E-06 


0.051 


2. 06 E-01 


0.061 


1 .28E-01 


0.1 74 


U.Uj I 


U.UO 1 


1 .1 3E-04 


5 


rr- 1 TnC /I "7 

rs 1 AjDHoi / 




-0.029 


4.83E-01 


-0.042 


T 1 or n 1 

i. \ ob-U 1 


0.1 62 


2.1 2E-05 


-0.031 


-U.U43 


n 1 

u. 1 00 


2.95E-06 


r 
J 


rcTQ/iriD') 

rSzy4Uoz 




n nA7 


1 -Uzt-U 1 


n nQ7 
U.Uo/ 


J.zot-Uz 


U. 1 0 1 


c one HA 


U.UOb 


U.Uo/ 


u. 1 0 1 


1 71 P C\A 

\ ./ 1 t-U4 


c 

J 


rs 1 /D/ /Uj I 




-U.UbO 


^ 7QC m 

o./yt-Uz 


n 1 QQ 




-U.Uoy 


0. 1 Dt-Uz 


-U.Uo 1 


n 1 RR 
u. 1 00 


u.uyj 


o.Uyt-Uj 


r 
D 


rS'^ioo/ /u/ 




-U.Uoo 


3.ZZt-Uz 


r\ 1 Qo 

-u. 1 oy 


/.UUt-Uo 


n no 1 
-u.uy 1 


1 77P m 
z./ / L-Uz 


-U.UOD 


n 1 QQ 
-u. 1 00 


-u.uyz) 


D.oUt-Uj 


6 


rs4Uo4Uy/ 




0.1 63 


9,ol b-Oo 


-0.004 


9.27E-01 


0.092 


1 .68E-02 


0.1 58 


-U.UUD 


U.uyj 


o.U5b-0o 


6 


rs/ /D 1 Z 14 


CDUA 7 

trnA/ 


-0.191 


9,1 6 E-06 


-0.009 


8. 26 E-01 


-0.099 


1 .85 E-02 


-0. 1 90 


-0.008 


-0.100 


1 .39E-05 


8 


rs4495407 




0.038 


3.60E-01 


0.012 


7.74E-01 


0.185 


8.40E-06 


0.036 


0.012 


0.189 


5.59E-05 


8 


rsl 0956596 




-0.044 


2.82E-01 


-0.047 


2.58E-01 


-0.185 


8.82E-06 


-0.043 


-0.047 


-0.188 


1.23E-04 


8 


rs6470947 




0.053 


1.94E-01 


0.023 


5.69E-01 


0.187 


6.69E-06 


0.053 


0.023 


0.190 


6.05E-05 


8 


rs4615555 




0.051 


2.17E-01 


0.030 


4.69E-01 


0.191 


3.81 E-06 


0.049 


0.029 


0.194 


3.34E-05 


8 


rs4279577 




0.052 


2.06E-01 


0.031 


4.56E-01 


0.192 


3.44E-06 


0.051 


0.030 


0.196 


3.26E-05 


8 


rs7465333 




0.050 


2.31 E-01 


0.031 


4.57E-01 


0.189 


6.33E-06 


0.048 


0.031 


0.193 


5.75E-05 


11 


rs550214 




0.081 


4.38E-02 


0.175 


6.09E-06 


0.102 


1.01 E-02 


0.077 


0.174 


0.106 


8.32E-05 


15 


rsl 1636344 


FBNl 


0.075 


5.81 E-02 


0.167 


6.51 E-06 


0.035 


3.88E-01 


0.073 


0.166 


0.037 


1.06E-04 


16 


rsl 7722281 


WWOX 


-0.142 


7.68E-04 


-0.160 


1.52E-04 


0.034 


4.16E-01 


-0.140 


-0.161 


0.034 


7.66E-06 



Univariate GIVIDR and multivariate GIVIDR analyses of HP|, 
HP2, and HP3 

The results of the univariate GMDR and multivariate 
GMDR are summarized in Table 6 for the repeatedly 
measured hypertension status HPi, HP2, and HP3. For 
these hypertension phenotypes, 34 SNPs selected from 
the preliminary analysis were included to GMDR 
mechanisms. All possible one and two locus models 
were fit for 34 SNPs. Not surprisingly, all different SNPs 
were identified in one-locus model. For the comparison 
between GMDR methods and classic method of logistic 
and GEE models, we report the p-values from the logis- 
tic models and GEE model for the identified SNPs from 
GMDR methods in one-locus models: 1.02E-05 for HPi, 
1.59E-05 for HP2, 6.33E-06 for HP3, and 8.50E-05 for 
multivariate phenotypes. The identified SNPs by GMDR 
methods also have significant p-values from the classic 
methods. The best two-locus model from multivariate 
GMDR included rs7791839 in CCDC129 gene and 
rs7168365 in WDR72 implying that the interaction 

Table 4 Transition of hypertensive case over time. 

HP, Time 1 (716) 

Hypertension Normal 

HP3 Time 3 (288) HP3 Time 3 (410) 

Hypertension Normal Hypertension Normal 

HPjTlme Hyper- 166 154 147 239 

2 (705) tension 

Normal 122 274 263 4101 

Note that numbers within parentheses are the number of hypertensive case 
at each time point. 



between CCDC129 and WDR72 genes was identified as 
a significant contributor to the repeatedly measured 
hypertension status. Box plots and density plots of test 
BA for GMDR and multivariate GMDR of HPs are 
given in Figure 4. Similar to the results of dichotomized 
SBPb and DBPg, the test BA for multivariate GMDR 
had a smaller deviation in the both one-and two-locus 
models. 

Comparison of univariate GMDR and multivariate GMDR 

We presented the results of univariate and multivariate 
GMDR by the same phenotypes in the previous two 
sub-sessions. However, those comparisons are not sig- 
nificantly meaningful to describe the usefulness of mul- 
tivariate GMDR. Here, we compared the results from 
multivariate GMDR of SBPb and DBPb with the results 



Table 5 Comparison of results for SBP and DBP by GMDR 
and multivariate GIVIDR. 



No. of 
Loci 


IVIethod 


Best model 


Train 
BA 


Test 
BA 


cvc 


1 


GMDR_SBP 


rsl 549022 


0.544 


0.544 


5 




GMDR_DBP 


rsl 10771 35 


0.548 


0.547 


7 




Multivariate 
GMDR 


rsl 1866964 


0.539 


0.536 


8 


2 


GMDR_SBP 


rs21 11464, 
rsl 2942470 


0.566 


0.556 


7 




GMDR_DBP 


rsl 378942, 

rsl 1 866964 


0.566 


0.555 


3 




Multivariate 
GMDR 


rs7555790, 
rsl 10771 35 


0.551 


0.545 


2 
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Figure 3 Graphical descriptions for Test BA from GMDR analyses for SBP and DBP in univariate and multivariate analyses (A) Box plots 
of Test BA in one-locus model (B) Box plots of Test BA in two-locus model (C) Density plots of Test BA in one-locus model (D) Density plots of 
Test BA in two-locus model. 



from the GMDR of HPi including the same individuals 
and candidate SNPs (Table 7). Because hypertension 
was defined by SBPg or DBPg, we can directly compare 
the performance of multivariate GMDR and univariate 
GMDR through those analyses. Multivariate GMDR and 
GMDR yielded the same best two-locus model. 



Table 6 Comparison of results for longitudinal 
hypertension by GMDR and multivariate GMDR. 



No. of 
Loci 


Method 


Best model 


Train 
BA 


Test 
BA 


cvc 


1 


GMDR_ HP, 


rs 11097953 


0.542 


0.543 


9 




GMDR_ HP2 


rsl 11 15097 


0.545 


0.546 


5 




GMDR_ HP3 


rs7465333 


0.540 


0.542 


5 




Multivariate 
GMDR 


rs71 68365 


0.529 


0.528 


9 


2 


GMDR_ HP, 


rsl 1097953, 
rs7751214 


0.555 


0.540 


6 




GMDR_ HP2 


rsl 11 15097, 
rsl 7722281 


0.566 


0.566 


8 




GMDR_ HP3 


rs7791839 
rs6470947 


0.563 


0.563 


9 




Multivariate 
GMDR 


rs7791839, 
rs71 68365 


0.544 


0.544 


7 



However, multivariate GMDR shows slightly better test 
BA than GMDR. Box plots of test BA for multivariate 
GMDR and GMDR from those two analyses are given 
in Figure 5. The test BA of multivariate model has smal- 
lest deviation also. 

Conclusions 

In this paper, we have developed multivariate analysis 
for discovering gene-gene interaction, namely multivari- 
ate GMDR. Our multivariate GMDR analysis was devel- 
oped by utilizing a GEE approach to multivariate 
phenotypes. Many studies emphasized the importance 
and the increase of power for multivariate analysis in 
GWAS [33-35]. Although MDR method has been devel- 
oped in variety of manners [5-9], there have been no 
extensions to the multivariate analysis. We proposed 
multivariate GMDR analysis by utilizing the GEE model 
to calculate the prediction score to be a tool for redu- 
cing the multifactor dimensionality. The GEE approach 
is an extension of generalized linear models to the longi- 
tudinal data and handles both discrete and continuous 
phenotypes. Thus, our multivariate GMDR can be 
applicable to both discrete and continuous phenotypes. 
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Though real GWAS data analysis, we investigated 
the properties of multivariate GMDR. Firstly, the 
result of multivariate GMDR does not always coincide 
with that of GEE approach. That is, the best SNP set 
selected by multivariate GMDR does not always have 
the smallest p-value from GEE model. In our analysis, 
note that the SNP set selected by multivariate GMDR 
still tends to have quite a small p-value. Secondly, the 
test BAs of the multivariate GMDR is not always lar- 
ger than those of univariate GMDR. As shown in Fig- 
ures 3 to 5, the distribution of test BAs from the 
multivariate GMDR is different from those of univari- 
ate GMDR. The test BAs of multivariate GMDR are 



more densely distributed with a smaller standard 
deviation than those of univariate GMDR. Thus, a 
direct comparison of test BAs between multivariate 
GMDR and univariate GMDR may lead a misleading 
conclusion. 

The proposed multivariate GMDR can be extended in 
many different ways. The modified version BAs which 
takes account for the distributional difference is 
expected to improve the performance of multivariate 
GMDR. The testing procedure using the modified BAs 
under the null distribution would enable us to demon- 
strate the increase of power of multivariate GMDR. A 
prediction score is defined as the sum of elements of 



Table 7 Comparison of results for SBP and DBF by multivariate GMDR and hypertension at time 1 (HPi) by GIVIDR. 

No. of Loci Method Best model Train BA Test BA CVC 

1 Multivariate GMDR with BPs rsl 1866964 0.539 0.536 9 
GMDR with HP, rs4811719 0.542 0.541 4 

2 Multivariate GMDR with BPs rs1338574, rs481 1 71 9 0.560 0.557 7 
GMDR with HP, rsl 338574, rs481 171 9 0.560 0.554 7 
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the score vector from GEE model. We are currently 
working on several different weighting schemes for 
accounting various relationships between phenotypes. 
The weighted prediction score is also expected to 
improve the performance of multivariate GMDR. In the 
future studies, all these extensions will be evaluated 
through extensive simulation studies. 
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